The data set used to generate our research questions comes from the Kiva Crowdfunding ‘Data Science for Good’ open data initiative. Kiva (2023) is a service that provides small loans to the world’s unbanked population. The initiative was created so that members of the public could help Kiva better understand poverty levels in the areas where it has active loans (Kaggle, 2018). The data has a CC0: Public Domain licence, meaning that we are free to use and distribute it as we wish (Creative Commons, 2023). The data has been edited by a community member called ‘mfab’, which may reduce its reliability; however, as Kiva is listed as the ‘Owner’, we assume that it approved this editor and that the data remains reliable. A notable limitation is that the data does not cover the COVID-19 pandemic, which may have changed lending trends in ways that would have been interesting to research.
The world_gdp data is reliable as it is collected and consolidated by the World Bank, an international financial institution that is part of the United Nations system. It has a CC BY 4.0 licence, which allows users to copy, modify and distribute the data in any format for any purpose, provided that appropriate credit is given (World Bank, 2021).
The countries dataset from Kaggle is less reliable, as it was uploaded by an independent user and some other Kaggle users have expressed concern about its reliability. Since only the country name and region columns are used, we believe that it is suitable for this report.
Some data wrangling was required before the datasets could be used. Most of it involved grouping to isolate variables for comparison. The functions mutate, rename and merge were also used to change variable values, rename columns and merge datasets when generating our figures.
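As a minimal sketch of the wrangling pattern described above, the toy example below groups, mutates, renames and merges two small data frames. The data frames `loans` and `regions` and all of their values are invented for illustration and are not taken from the Kiva data.

```r
library(dplyr)

# Hypothetical toy data (not from the Kiva datasets)
loans <- tibble(
  country = c("Kenya", "Kenya", "Peru"),
  loan_amount = c(100, 300, 250)
)
regions <- tibble(
  name = c("Kenya", "Peru"),
  region = c("SUB-SAHARAN AFRICA", "LATIN AMER. & CARIB")
)

loan_totals <- loans %>%
  group_by(country) %>%                          # one group per country
  summarise(total_loans = sum(loan_amount)) %>%  # collapse each group to one row
  mutate(total_loans_k = total_loans / 1000)     # derive a new column

# Align the key column names so the two tables can be merged
regions_renamed <- rename(regions, country = name)

merged <- merge(loan_totals, regions_renamed, by = "country")
print(merged)
```

The same three steps (summarise per group, align column names, merge on a shared key) recur in each of the research questions below.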
Potential stakeholders for this report are people considering lending money through services like Kiva who want to know where their money is sent and how it is used. The report gives a general overview of the types of people who apply for loans, how large their loans are and how the loaned money is used.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.1 ✔ stringr 1.4.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tmap)
library(countrycode)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
kiva_loans <- read_csv("data/kiva_loans.csv") # read in Kiva data
## Rows: 671205 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): activity, sector, use, country_code, country, region, currency, t...
## dbl (6): id, funded_amount, loan_amount, partner_id, term_in_months, lende...
## dttm (3): posted_time, disbursed_time, funded_time
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let’s create our own country dataset.
Sources:
- https://www.kaggle.com/datasets/juanumusic/countries-iso-codes
- https://www.kaggle.com/datasets/fernandol/countries-of-the-world
# Creating our own 'Countries' dataset from two other datasets
countries <- read_csv("data/countries.csv")
## Rows: 227 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): Country, Region, Pop. Density (per sq. mi.), Coastline (coast/area...
## dbl (3): Population, Area (sq. mi.), GDP ($ per capita)
## num (6): Infant mortality (per 1000 births), Literacy (%), Other (%), Clima...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
countries_clean <- countries %>%
  clean_names() %>% # convert column names to snake_case
  subset(select = c("country", "region"))
iso_codes <- read_csv("data/iso_codes.csv") %>%
  rename(name = `English short name lower case`) %>%
  rename(country_code = `Alpha-2 code`) %>%
  subset(select = c("name", "country_code")) # only need the two-letter (Alpha-2) code
## Rows: 246 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): English short name lower case, Alpha-2 code, Alpha-3 code, Numeric ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
iso_countries <- merge(countries_clean, iso_codes, by.x = "country", by.y = "name")
head(iso_countries)
## country region country_code
## 1 Afghanistan ASIA (EX. NEAR EAST) AF
## 2 Albania EASTERN EUROPE AL
## 3 Algeria NORTHERN AFRICA DZ
## 4 American Samoa OCEANIA AS
## 5 Andorra WESTERN EUROPE AD
## 6 Angola SUB-SAHARAN AFRICA AO
# Merging Countries dataset with Kiva loans
kiva_countries <- merge(kiva_loans, iso_countries, by.x = "country_code", by.y = "country_code")
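Because `merge()` performs an inner join by default, loans whose `country_code` has no match in the lookup table are dropped without warning. The sketch below shows one way to list the unmatched codes with `dplyr::anti_join()`. The data frames `loans_toy` and `lookup_toy` are hypothetical stand-ins for `kiva_loans` and `iso_countries`, and their values are invented.

```r
library(dplyr)

# Hypothetical stand-ins for kiva_loans and iso_countries
loans_toy <- tibble(
  id = 1:4,
  country_code = c("KE", "PE", "XK", NA)
)
lookup_toy <- tibble(
  country_code = c("KE", "PE"),
  region = c("SUB-SAHARAN AFRICA", "LATIN AMER. & CARIB")
)

# Rows of loans_toy that have no partner in lookup_toy
unmatched <- anti_join(loans_toy, lookup_toy, by = "country_code")

# How many loans would be lost per unmatched code
unmatched_codes <- count(unmatched, country_code)
print(unmatched_codes)
```

Running the same check on the real tables would show how many loans are excluded from any analysis based on `kiva_countries`.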
## Is there a correlation between a country’s GDP per capita and its total loan sum?
To answer this question, an interactive scatter plot was drawn using the code below. Hovering over a data point brings up the country’s name, its GDP per capita and its total loan sum. Both axes are logarithmic (base 10) in order to spread out the data so that relationships are easier to see.
world_gdp <- read_csv("data/world_gdp.csv", skip = 4) # reads the file "world_gdp.csv" and assigns it as "world_gdp"
gdp_cleaned <- world_gdp %>%
  rename(country = `Country Name`, country_gdp = `2015`) %>% # rename the country and 2015 GDP columns
  mutate(country = recode(country,
    `Congo, Rep.` = "Congo", `Congo, Dem. Rep.` = "The Democratic Republic of the Congo",
    `Cote d'Ivoire` = "Cote D'Ivoire", `Egypt, Arab Rep.` = "Egypt",
    `Kyrgyz Republic` = "Kyrgyzstan", `Lao PDR` = "Lao People's Democratic Republic",
    `Myanmar` = "Myanmar (Burma)", `West Bank and Gaza` = "Palestine",
    `St. Vincent and the Grenadines` = "Saint Vincent and the Grenadines",
    `Virgin Islands (U.S.)` = "Virgin Islands", `Yemen, Rep.` = "Yemen")) %>% # align World Bank country names with those in the Kiva data
  select(country, country_gdp) # keep one row per country with its 2015 GDP value
kiva_gdp <- kiva_loans %>%
  group_by(country) %>%
  summarise(loan_sum = sum(loan_amount)) # total loan amount per country
kiva_gdp <- merge(kiva_gdp, gdp_cleaned, by = "country", all.x = TRUE) # left join: keep every Kiva country, even without GDP data
colnames(kiva_gdp) <- c("Country", "Loan Sum", "GDP per Capita")
plot_kiva_gdp <- ggplot(kiva_gdp, aes(x = `GDP per Capita`, y = `Loan Sum`, country = `Country`)) + # country is passed through so it appears in the hover text
  geom_point(colour = "magenta") +
  scale_x_continuous(trans = 'log10') +
  scale_y_continuous(trans = 'log10') +
  labs(x = "GDP per capita (USD)", y = "Sum of Loans (USD)", title = "The total sum of Kiva loans against the GDP per capita for each country")
ggplotly(plot_kiva_gdp)
The GDP data in the scatter plot above is from 2015. This year was chosen because it falls within the period covered by the Kiva loan data, which runs from 2014 to 2017. The points cluster in the top left of the plot, which suggests that countries with a lower GDP per capita account for the majority of loaned money. There is, however, no clear linear correlation between the total loaned amount and GDP per capita. It is important to note that the countries with the highest total loan sums tend to be developing countries with a relatively low GDP per capita. In such countries, agriculture tends to be the largest sector in terms of its share of GDP and employment (https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=7d15e1f8e20c97dd4eeeccd2ad3f5e3e4255b831#:~:text=Agricultural%20Spending,shares%20in%20GDP%20and%20employment.). This leads into the next research question, which looks at how funding is distributed across sectors.
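One way to put a number on the relationship described above is a rank correlation, which does not assume linearity. The sketch below uses base R's `cor()` on a small invented data frame. `toy_gdp` and its values are hypothetical (they are not Kiva or World Bank figures); in the report, the same calls could be applied to the `GDP per Capita` and `Loan Sum` columns of `kiva_gdp`.

```r
# Hypothetical values for illustration only
toy_gdp <- data.frame(
  gdp_per_capita = c(500, 1200, 3000, 8000, 40000, NA),
  loan_sum       = c(2e6, 5e6, 1e6, 3e5, 5e4, 1e5)
)

# Spearman's rho works on ranks, so it is unaffected by the log10 axis transform
rho <- cor(toy_gdp$gdp_per_capita, toy_gdp$loan_sum,
           method = "spearman", use = "complete.obs")

# Pearson correlation on the log10 scale, matching the axes of the scatter plot
r_log <- cor(log10(toy_gdp$gdp_per_capita), log10(toy_gdp$loan_sum),
             use = "complete.obs")

print(c(spearman = rho, pearson_log10 = r_log))
```

`use = "complete.obs"` drops countries with a missing GDP value, which matters here because the merge keeps Kiva countries even when no GDP figure is available.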
What is the distribution of funding between the Kiva designated sectors?
kiva_loans %>%
  group_by(sector) %>%
  summarise(total_funding = signif(sum(funded_amount) * 10^(-6), 5)) %>% # total funding per sector in million USD, to 5 significant figures
  ggplot(aes(x = fct_reorder(sector, desc(total_funding)), y = total_funding)) +
  geom_col() + # bar chart with sectors ordered from most to least funded
  labs(x = "Sector", y = "Total Funding (Million USD)", title = 'Total Kiva funding per Sector') +
  theme_classic() +
  geom_text(aes(label = total_funding), vjust = -0.5, size = 2) + # print each total above its bar
  theme(axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.4)) # rotate the sector labels so they do not overlap
The bar plot above shows that agriculture, food and retail are by far the most funded Kiva sectors. For context, these three sectors receive double the funding of the remaining 12. Interestingly, a review of the ‘use’ column of the data frame shows that many loans categorised as retail are in fact loans to purchase food items such as salt, rice and flour.
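The comparison between the three largest sectors and the rest can be checked directly from the per-sector totals. A minimal sketch, using a made-up `sector_totals` table in place of the real totals (the sector names are Kiva's, but the numbers are invented):

```r
library(dplyr)

# Hypothetical totals in million USD (not the real Kiva figures)
sector_totals <- tibble(
  sector = c("Agriculture", "Food", "Retail", "Services", "Housing"),
  total_funding = c(40, 35, 30, 15, 10)
)

top_three_vs_rest <- sector_totals %>%
  mutate(group = if_else(rank(-total_funding) <= 3, "top three", "remaining")) %>% # flag the three largest sectors
  group_by(group) %>%
  summarise(funding = sum(total_funding)) %>%  # total funding per group
  mutate(share = funding / sum(funding))       # each group's share of all funding

print(top_three_vs_rest)
```

Dividing the ‘top three’ funding by the ‘remaining’ funding gives the ratio quoted in the text.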
library(gt)
kiva_loans %>%
filter(sector == 'Food') %>%
select(use) %>%
slice(1:10) %>%
gt()
| use |
|---|
| To buy seasonal, fresh fruits to sell. |
| to purchase one buffalo. |
| to buy a stall, gram flour, ketchup, and coal for selling ladoo. |
| to buy ingredients to make bakery products. |
| to purchase vegetables, chicken, and oil to cook food to sell. |
| to purchase a variety of needed food items to prepare food to sell. |
| to purchase one cow. |
| to purchase a new, bigger-size cart. |
| to purchase sacks of tomatoes, potatoes, fruits, and green vegetables for resale |
| to buy meat and also to start selling fish in his butcher shop. |
The conclusions drawn from the first research question indicate that Kiva loans are most typically requested in developing countries with a low GDP per capita. In these countries access to food is not a given, and the creation of a constant food supply may be able to lift some people out of poverty. Townsend (2015) states that, for the world’s poorest, growth in agriculture is two to four times more effective in raising living standards than growth in the next closest sector. This can create a win-win scenario for all stakeholders: so long as the loan is used wisely, borrowers can create more value than they initially borrowed, and funders can see their investment amount to more than its dollar value. As such, it is not surprising that agriculture loans make up 27% of all Kiva loans.
Townsend, R. (2015). Ending poverty and hunger by 2030: An agenda for the global food system. Washington, DC. Retrieved from https://documents.worldbank.org/en/publication/documents-reports/documentdetail/700061468334490682/ending-poverty-and-hunger-by-2030-an-agenda-for-the-global-food-system
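# Grouping agriculture loans by region
agriculture_by_region <- kiva_countries %>%
  filter(sector == "Agriculture") %>% # keep agriculture loans only
  group_by(region = region.y) %>% # region.y is the region column from the countries lookup
  summarise(total_loans_sum = sum(loan_amount))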
Summary:
ggplot(agriculture_by_region, aes(x = region, y = total_loans_sum)) +
  geom_col() + # geom_col() plots the values as given, so no stat argument is needed
  theme(axis.text.x = element_text(angle = 90),
        axis.text.y = element_text(angle = 90))
What Are the Average Loan Amounts per Gender in Each Region?
## Keeping only loans whose borrower_genders value is exactly "male" or "female" (a single borrower); group loans and missing values are excluded.
genders_clean <- kiva_loans %>%
  filter(!is.na(borrower_genders) & borrower_genders %in% c("male", "female"))
## Finding the mean funded amount for genders dependent on country
summary_aggregated <- genders_clean %>%
group_by(country, borrower_genders) %>%
summarize(mean_funded_amount = mean(funded_amount))
## Making columns and separating data
summary_aggregated <- summary_aggregated %>%
mutate(males = ifelse(borrower_genders == "male", mean_funded_amount, 0), females = ifelse(borrower_genders == "female", mean_funded_amount, 0))
## Synthesizing data
synthesized_data <- summary_aggregated %>%
group_by(country) %>%
summarize(male = sum(males), female = sum(females))
## Renaming a column
synthesized_data <- rename(synthesized_data, c("Country" = "country"))
countries_clean <- rename(countries_clean, c("Country" = "country"))
## Combining male and female data with countries data
countries_clean <- inner_join(countries_clean, synthesized_data, by = "Country")
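## Regions Only: average the country-level means within each region
regions_data <- countries_clean %>%
  group_by(region) %>%
  summarize(male = mean(male), female = mean(female))
## Reshaping to long format (one row per region and gender) for plotting
regions_data <- regions_data %>%
  pivot_longer(cols = c(male, female), names_to = "gender", values_to = "value")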
## Plotting Side by Side Graph
ggplotly(ggplot(data = regions_data, aes(x = region, y = value, fill = gender)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "\nRegions", y = "Mean Loan Amount\n", title = "\n Mean Loan Amount Per Gender in Regions\n") +
theme(plot.title = element_text(hjust = 0.5),
axis.title.x = element_text(face = "bold", colour = "red", size = 10),
axis.title.y = element_text(face = "bold", colour = "red", size = 10)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)))
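The chunk above first averages loans within each country and then combines those country-level figures per region, so every country counts equally regardless of how many loans it has. As an alternative sketch, the region can be attached to each loan first and the mean taken directly per region and gender. The data frames `loans_toy` and `regions_toy` are hypothetical stand-ins for `genders_clean` and the countries lookup, with invented values.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for genders_clean and the countries lookup
loans_toy <- tibble(
  country = c("Kenya", "Kenya", "Uganda", "Uganda", "Peru", "Peru"),
  borrower_genders = c("female", "male", "female", "female", "male", "female"),
  funded_amount = c(200, 400, 300, 500, 1000, 600)
)
regions_toy <- tibble(
  country = c("Kenya", "Uganda", "Peru"),
  region = c("SUB-SAHARAN AFRICA", "SUB-SAHARAN AFRICA", "LATIN AMER. & CARIB")
)

region_means <- loans_toy %>%
  inner_join(regions_toy, by = "country") %>%   # attach a region to every loan
  group_by(region, borrower_genders) %>%
  summarise(mean_funded_amount = mean(funded_amount), .groups = "drop") # loan-weighted mean

# Wide layout: one row per region, one column per gender (NA if a gender is absent)
region_means_wide <- pivot_wider(region_means,
                                 names_from = borrower_genders,
                                 values_from = mean_funded_amount)
print(region_means_wide)
```

With this approach a region with no loans for one gender shows `NA` rather than zero, so missing data is not mistaken for a mean of zero.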